The objectives for this week are manifold. First, we talked briefly about YAML files last week, but I’ve never intentionally manipulated this metadata before. Second, after watching this talk by Desirée De Leon, I wanted to try some of the principles that were used to create Teacups, Giraffes, and Statistics.
Third, I wanted to further explore the TidyTuesday dataset on Penguins using the GGally package.
Let’s get started!
We have the good characters covered, since the art is provided by Dr. Allison Horst, along with data collected and made available by Dr. Kristen Gorman and a nice package developed with Dr. Alison Hill: palmerpenguins. Think of iris, but with penguins.
For the good play, I will incorporate an interactive component into this R Markdown document. Aside from using plotly for visualization, we could use the learnr package, which presents data and information in a tutorial-friendly format (e.g. equations, videos, code exercises, quizzes, shiny components).
For the good design criterion, I’ve incorporated div tips to make the document stand out a bit. Here’s a link on how to make them.
There are a few options incorporated into the YAML configuration. The header used was imported from an HTML file that points to a local image file of the lter penguins artwork.
Available highlighting styles for code chunks can be listed by running the following in the terminal: pandoc --list-highlight-styles. For this file, I used zenburn. The document theme can also be changed from the default, either with one of the pre-packaged themes or by installing R packages that provide additional ones (check out this blogpost). In this document I used the simplex theme.
For all the div tips used here, I picked the colors from the lter penguins artwork using ColorSlurp, and I imported the Google font Indie Flower in the CSS style file. The images within the div tips are courtesy of Desirée De Leon.
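As a rough sketch of how the options described above could fit together in the YAML front matter of an R Markdown file: the theme and highlight values are the ones named in this post, while the title, the file names `header.html` and `style.css`, and the choice of `before_body` for the header include are assumptions made for illustration, not the exact configuration used here.

```yaml
---
title: "Penguins"               # hypothetical title
output:
  html_document:
    theme: simplex              # theme named in this post
    highlight: zenburn          # one of the styles from `pandoc --list-highlight-styles`
    css: style.css              # hypothetical file holding the div tip colors and font import
    includes:
      before_body: header.html  # hypothetical file; assumes the image banner sits above the body
---
```

If the HTML file only carries metadata or scripts rather than a visible banner, `in_header` would be the matching key instead of `before_body`.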
We will use plotly for interactive plots and GGally for scatterplot matrix correlograms.
suppressPackageStartupMessages(library(tidyverse))
library(plotly)
library(skimr)
library(GGally)

penguins <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv')

skim(penguins)

| Name | penguins |
|---|---|
| Number of rows | 344 |
| Number of columns | 8 |
| Column type frequency: | |
| character | 3 |
| numeric | 5 |
| Group variables | |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| species | 0 | 1.00 | 6 | 9 | 0 | 3 | 0 |
| island | 0 | 1.00 | 5 | 9 | 0 | 3 | 0 |
| sex | 11 | 0.97 | 4 | 6 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| bill_length_mm | 2 | 0.99 | 43.92 | 5.46 | 32.1 | 39.23 | 44.45 | 48.5 | 59.6 | ▃▇▇▆▁ |
| bill_depth_mm | 2 | 0.99 | 17.15 | 1.97 | 13.1 | 15.60 | 17.30 | 18.7 | 21.5 | ▅▅▇▇▂ |
| flipper_length_mm | 2 | 0.99 | 200.92 | 14.06 | 172.0 | 190.00 | 197.00 | 213.0 | 231.0 | ▂▇▃▅▂ |
| body_mass_g | 2 | 0.99 | 4201.75 | 801.95 | 2700.0 | 3550.00 | 4050.00 | 4750.0 | 6300.0 | ▃▇▆▃▂ |
| year | 0 | 1.00 | 2008.03 | 0.82 | 2007.0 | 2007.00 | 2008.00 | 2009.0 | 2009.0 | ▇▁▇▁▇ |
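To make the complete_rate column easier to read, here is a small worked check in base R. It assumes that complete_rate is simply the share of non-missing values, and it uses only the counts reported in the tables above (344 rows, 11 missing for sex, 2 for bill_length_mm, 0 for year).

```r
# Counts taken from the skim output above
n_rows <- 344
n_missing <- c(sex = 11, bill_length_mm = 2, year = 0)

# Assumption: complete_rate = 1 - n_missing / number of rows
complete_rate <- 1 - n_missing / n_rows

round(complete_rate, 2)
# sex 0.97, bill_length_mm 0.99, year 1.00
```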
The downloaded data is pretty clean. Besides dropping the rows with missing values (most of them in the sex variable), the only other change was converting year to a factor.
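Before applying this to the real data, it may help to see what drop_na() does on a toy example. The tibble below is made up for illustration (it is not the penguins data); the sketch shows that drop_na() removes any row with a missing value in any column, and that count() can then tally what is left.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the penguins data (values are invented for illustration)
toy <- tibble(
  species     = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  sex         = c("male", NA, "female", "female"),
  body_mass_g = c(3750, 3800, NA, 3500),
  year        = c(2007, 2007, 2008, 2009)
)

toy_df <- toy %>%
  drop_na() %>%                    # removes rows 2 and 3, each has an NA in some column
  mutate(year = as.factor(year))   # treat year as a category, not a number

nrow(toy) - nrow(toy_df)           # number of rows removed
count(toy_df, species, sex)        # tally the remaining observations
```

Calling drop_na(sex) instead would only drop rows where sex itself is missing, which is worth keeping in mind if the other columns should be left alone.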
penguins_df <- penguins %>%
  drop_na() %>%
  mutate(year = as.factor(year))

ggpairs(penguins_df)

Click on a legend label to remove its observations from the plots!
p <- ggplot(penguins_df, aes(flipper_length_mm, bill_length_mm, fill = species, color = species)) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y ~ x) +
  hrbrthemes::theme_ipsum() +
  scale_fill_manual(values = c("#FF8000", "#C85BCA", "#0E7274")) +
  scale_color_manual(values = c("#FF8000", "#C85BCA", "#0E7274"))

ggplotly(p, height = 800, width = 800)

p2 <- ggplot(penguins_df, aes(body_mass_g, island, color = species)) +
  geom_point() +
  facet_grid(~sex) +
  hrbrthemes::theme_ipsum() +
  scale_color_manual(values = c("#FF8000", "#C85BCA", "#0E7274"))

ggplotly(p2, height = 800, width = 800)

Here are some cool links!
On learnr:
https://desiree.rbind.io/post/2020/learnr-iframes/
https://bookdown.org/yihui/rmarkdown/learnr.html
https://rstudio4edu.github.io/rstudio4edu-book/learnr.html
On GitHub:
https://htmlpreview.github.io/